In this paper, we propose a novel implementation for solving the large-scale k-medoids clustering problem. Conversely to the most famous k-means, k-medoids suffers from a computationally intensive phase for medoids evaluation, whose complexity is quadratic in space and time; thus solving this task for large datasets and, specifically, for large clusters might be unfeasible. In order to overcome this problem, we propose two alternatives for medoids update, one exact method and one approximate method: the former based on solving, in a distributed fashion, the quadratic medoid update problem; the latter based on a scan and replacement procedure. We implemented and tested our approach using the Apache Spark framework for parallel and distributed processing on several datasets of increasing dimensions, both in terms of patterns and dimensionality, and computational results show that both approaches are efficient and effective, able to converge to the same solutions provided by state-of-the-art k-medoids implementations and, at the same time, able to scale very well as the dataset size and/or number of working units increase.

Efficient approaches for solving the large-scale k-medoids problem / Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo. - STAMPA. - (2017), pp. 338-347. (Intervento presentato al convegno 9th International Joint Conference on Computational Intelligence – IJCCI 2017 tenutosi a Funchal, Madeira - Portugal nel 1-3 november 2017) [10.5220/0006515003380347].

Efficient approaches for solving the large-scale k-medoids problem

Martino, Alessio
;
Rizzi, Antonello;Frattale Mascioli, Fabio Massimo
2017

Abstract

In this paper, we propose a novel implementation for solving the large-scale k-medoids clustering problem. Conversely to the most famous k-means, k-medoids suffers from a computationally intensive phase for medoids evaluation, whose complexity is quadratic in space and time; thus solving this task for large datasets and, specifically, for large clusters might be unfeasible. In order to overcome this problem, we propose two alternatives for medoids update, one exact method and one approximate method: the former based on solving, in a distributed fashion, the quadratic medoid update problem; the latter based on a scan and replacement procedure. We implemented and tested our approach using the Apache Spark framework for parallel and distributed processing on several datasets of increasing dimensions, both in terms of patterns and dimensionality, and computational results show that both approaches are efficient and effective, able to converge to the same solutions provided by state-of-the-art k-medoids implementations and, at the same time, able to scale very well as the dataset size and/or number of working units increase.
2017
9th International Joint Conference on Computational Intelligence – IJCCI 2017
cluster analysis; parallel and distributed computing; large-scale pattern recognition; unsupervised learning; big data mining
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Efficient approaches for solving the large-scale k-medoids problem / Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo. - STAMPA. - (2017), pp. 338-347. (Intervento presentato al convegno 9th International Joint Conference on Computational Intelligence – IJCCI 2017 tenutosi a Funchal, Madeira - Portugal nel 1-3 november 2017) [10.5220/0006515003380347].
File allegati a questo prodotto
File Dimensione Formato  
Martino_Efficient_2018.pdf

solo gestori archivio

Note: Efficients Approaches for Solving the Large-Scale k-medoids Problem
Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 402.75 kB
Formato Adobe PDF
402.75 kB Adobe PDF   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1033136
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 29
  • ???jsp.display-item.citation.isi??? 17
social impact